62 research outputs found
Delving Deep into the Sketch and Photo Relation
Sketches drawn by humans can play a similar role to photos in conveying shape, posture and fine-grained information, a fact that has stimulated a line of cross-domain research relating sketch and photo, including sketch-based photo synthesis and retrieval. In this thesis, we further investigate the relationship between sketch and photo. More specifically, we study certain under-explored traits of this relationship and propose novel applications that reinforce the understanding of the sketch-photo relation.
Our exploration starts with the problem of sketch-based photo synthesis, where the unique trait of non-rigid alignment between sketch and photo is overlooked in existing research. We then carry on with our investigation from a new angle, studying whether a sketch can facilitate photo classifier generation. Building upon this, we explore how sketch and photo are linked on a more fine-grained level by tackling sketch-based photo segmenter prediction. Furthermore, we address the data scarcity issue identified in nearly all sketch-photo applications by examining their inherent semantic correlation, using sketch-based image retrieval (SBIR) as a test-bed. In total, we make four main contributions to research on the relationship between sketch and photo.
Firstly, to mitigate the effect of deformation in sketch-based photo synthesis, we introduce a spatial transformer network into our image-to-image regression framework, which subtly handles the non-rigid alignment between sketches and photos. Qualitative and quantitative experiments consistently reveal the superior quality of our synthesised photos over those generated by existing approaches.
Secondly, sketch-based photo classifier generation is achieved with a novel model regression network, which maps a sketch to the parameters of a photo classification model. We show that our model regression network is able to generalise across categories, so that photo classifiers for novel classes not involved in training are just a sketch away. Comprehensive experiments illustrate the promising performance of the generated binary and multi-class photo classifiers, and demonstrate that sketches can also be employed to enhance the granularity of existing photo classifiers.
Thirdly, to achieve sketch-based photo segmentation, we propose a segmentation model generation algorithm that predicts the weights of a deep photo segmentation network from the input sketch. The results confirm that a single sketch is the only prerequisite for segmenting photos of unseen categories, and that segmentation performance improves further when the sketch is aligned with the target object in shape and position.
Finally, we present an unsupervised representation learning framework for SBIR, whose purpose is to remove the barrier imposed by the scarcity of data annotation. Prototype- and memory-bank-reinforced joint distribution optimal transport is integrated into the framework, so that the mapping between sketches and photos can be discovered automatically to learn a semantically meaningful yet domain-agnostic feature space. Extensive experiments and feature visualisation validate the efficacy of our proposed algorithm.
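The model-regression idea in the second contribution can be sketched in a few lines: a mapping takes a sketch embedding to the weight vector of a binary photo classifier. The linear operator, all dimensions, and all values below are invented for illustration; the thesis uses a learned network rather than a fixed matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d_sketch, d_photo = 16, 32  # hypothetical embedding sizes

# Hypothetical linear "model regression" operator H mapping a sketch
# embedding to the weight vector of a binary photo classifier.
H = rng.normal(scale=0.1, size=(d_photo, d_sketch))

def classifier_from_sketch(sketch_emb):
    # Predicted photo-classifier weights for this (novel) category.
    return H @ sketch_emb

sketch = rng.normal(size=d_sketch)
w = classifier_from_sketch(sketch)

def score(photo_emb):
    # Sigmoid score: how strongly the photo matches the sketched category.
    return 1.0 / (1.0 + np.exp(-w @ photo_emb))

photo = rng.normal(size=d_photo)
print(round(float(score(photo)), 3))
```

A single sketch embedding thus yields a usable classifier with no annotated training photos for that category.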
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Existing works on weakly-supervised audio-visual video parsing adopt hybrid
attention network (HAN) as the multi-modal embedding to capture the cross-modal
context. It embeds the audio and visual modalities with a shared network, where
the cross-attention is performed at the input. However, such an early fusion
method highly entangles the two non-fully correlated modalities and leads to
sub-optimal performance in detecting single-modality events. To deal with this
problem, we propose the messenger-guided mid-fusion transformer to reduce the
uncorrelated cross-modal context in the fusion. The messengers condense the
full cross-modal context into a compact representation to only preserve useful
cross-modal information. Furthermore, because microphones capture audio events
from all directions while cameras record visual events only within a restricted
field of view, unaligned cross-modal context from audio occurs more frequently
in visual event prediction. We thus
propose cross-audio prediction consistency to suppress the impact of irrelevant
audio information on visual event prediction. Experiments consistently
illustrate the superior performance of our framework compared to existing
state-of-the-art methods.
Comment: WACV 202
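The messenger mechanism can be illustrated with plain dot-product attention: a small set of messenger tokens first condenses the audio context, and the visual stream then attends only to those condensed tokens. This is a minimal numpy sketch with invented sizes, not the authors' architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
T, d, M = 10, 8, 2                    # time steps, feature dim, messenger count (hypothetical)
audio = rng.normal(size=(T, d))
visual = rng.normal(size=(T, d))
messengers = rng.normal(size=(M, d))  # learnable tokens in the real model

# Step 1: messengers condense the full audio context into M compact vectors.
condensed = attend(messengers, audio, audio)

# Step 2: the visual stream attends only to the condensed messengers,
# limiting how much uncorrelated audio context leaks into visual events.
visual_fused = visual + attend(visual, condensed, condensed)
print(visual_fused.shape)
```

Because the visual stream never attends to the raw audio sequence directly, only the M-token bottleneck of cross-modal information reaches it.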
Generalized Few-Shot Point Cloud Segmentation Via Geometric Words
Existing fully-supervised point cloud segmentation methods suffer in the
dynamic testing environment with emerging new classes. Few-shot point cloud
segmentation algorithms address this problem by learning to adapt to new
classes at the cost of segmentation accuracy on the base classes, which
severely impedes their practicality. This largely motivates us to present the
first attempt at a more practical paradigm of generalized few-shot point cloud
segmentation, which requires the model to generalize to new categories with
only a few support point clouds and simultaneously retain the capability to
segment base classes. We propose geometric words to represent geometric
components shared between the base and novel classes, and incorporate them into
a novel geometric-aware semantic representation to facilitate better
generalization to the new classes without forgetting the old ones. Moreover, we
introduce geometric prototypes to guide the segmentation with geometric prior
knowledge. Extensive experiments on S3DIS and ScanNet consistently illustrate
the superior performance of our method over baseline methods. Our code is
available at: https://github.com/Pixie8888/GFS-3DSeg_GWs.
Comment: Accepted by ICCV 202
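The geometric-word idea, a shared codebook of geometric components in which both base and novel classes can be expressed, can be sketched as a soft assignment of point features to codebook entries. Everything below (dimensions, the cosine-similarity assignment, the concatenation) is an illustrative guess at the general mechanism, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, K = 100, 16, 6                  # points, feature dim, geometric words (illustrative)
feats = rng.normal(size=(N, d))       # per-point features from a backbone
geo_words = rng.normal(size=(K, d))   # codebook shared across base & novel classes

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Soft-assign each point feature to the geometric words (cosine similarity).
sim = l2norm(feats) @ l2norm(geo_words).T          # (N, K)
assign = np.exp(sim) / np.exp(sim).sum(-1, keepdims=True)

# Geometry-aware representation: concatenate the raw feature with its
# reconstruction from the geometric-word codebook.
geo_repr = np.concatenate([feats, assign @ geo_words], axis=-1)
print(geo_repr.shape)
```

Because the codebook is class-agnostic, the same geometric components can describe a novel class without overwriting what was learned for the base classes.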
End-To-End Semi-supervised Learning for Differentiable Particle Filters
Recent advances in incorporating neural networks into particle filters
provide the desired flexibility to apply particle filters in large-scale
real-world applications. The dynamic and measurement models in this framework
are learnable through the differentiable implementation of particle filters.
Past efforts in optimising such models often require the knowledge of true
states which can be expensive to obtain or even unavailable in practice. In
this paper, in order to reduce the demand for annotated data, we present an
end-to-end learning objective based upon the maximisation of a
pseudo-likelihood function, which can improve the estimation of states when a
large portion of the true states is unknown. We assess the performance of the
proposed method in state estimation tasks in robotics with simulated and
real-world datasets.
Comment: Accepted in ICRA 202
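The appeal of a likelihood-style objective is easiest to see in a vanilla bootstrap particle filter: the log marginal likelihood of the observations can be computed, and maximised, without ever seeing the true states. A minimal 1-D sketch, where the random-walk dynamics and Gaussian measurement model are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_loglik(obs, n_particles=500, sigma_x=1.0, sigma_y=0.5):
    """Bootstrap particle filter for a 1-D random-walk model; returns the
    log marginal likelihood of the observations. Maximising this over
    model parameters is a label-free training signal: no true states needed."""
    x = rng.normal(size=n_particles)
    loglik = 0.0
    for y in obs:
        x = x + rng.normal(scale=sigma_x, size=n_particles)   # dynamic model
        logw = (-0.5 * ((y - x) / sigma_y) ** 2               # measurement model
                - np.log(sigma_y * np.sqrt(2 * np.pi)))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())      # log p(y_t | y_{1:t-1}) estimate
        x = x[rng.choice(n_particles, n_particles, p=w / w.sum())]  # resample
    return loglik

obs = np.cumsum(rng.normal(size=20))   # synthetic observations
print(particle_filter_loglik(obs))
```

In a differentiable particle filter the dynamic and measurement models above become neural networks, and an objective of this flavour supplies gradients where state labels are missing.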
Sketch-based Video Object Segmentation: Benchmark and Analysis
Reference-based video object segmentation is an emerging topic which aims to
segment the corresponding target object in each video frame referred by a given
reference, such as a language expression or a photo mask. However, language
expressions can sometimes be vague in conveying an intended concept and
ambiguous when similar objects in one frame are hard to distinguish by
language. Meanwhile, photo masks are costly to annotate and less practical to
provide in a real application. This paper introduces a new task of sketch-based
video object segmentation, an associated benchmark, and a strong baseline. Our
benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and
Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet
low-cost reference for video object segmentation. We build on STCN, a popular
baseline for the semi-supervised VOS task, and evaluate the most effective
design for incorporating a sketch reference. Experimental results
show sketch is more effective yet annotation-efficient than other references,
such as photo masks, language and scribbles.
Comment: BMVC 202
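For reference, the standard region-similarity metric J used by DAVIS-style VOS benchmarks is simply the intersection-over-union between predicted and ground-truth masks:

```python
import numpy as np

def region_similarity(pred, gt):
    """Region similarity J: intersection-over-union between a predicted
    and a ground-truth binary segmentation mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

# Tiny synthetic masks: 2x2 prediction inside a 3x3 ground-truth object.
pred = np.zeros((4, 4), int); pred[1:3, 1:3] = 1
gt = np.zeros((4, 4), int); gt[1:4, 1:4] = 1
print(region_similarity(pred, gt))   # 4 / 9 ≈ 0.444
```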
Sketch-a-Classifier: Sketch-based Photo Classifier Generation
Contemporary deep learning techniques have made image recognition a
reasonably reliable technology. However, training effective photo classifiers
typically requires numerous examples, which limits image recognition's
scalability and its applicability to scenarios where images may not be
available. This has
motivated investigation into zero-shot learning, which addresses the issue via
knowledge transfer from other modalities such as text. In this paper we
investigate an alternative approach of synthesizing image classifiers: almost
directly from a user's imagination, via free-hand sketch. This approach doesn't
require the category to be nameable or describable via attributes as per
zero-shot learning. We achieve this via training a model regression network
to map from free-hand sketch space to the space of photo classifiers. It
turns out that this mapping can be learned in a category-agnostic way, allowing
photo classifiers for new categories to be synthesized by a user with no need
for annotated training photos. We also demonstrate that this modality of
classifier generation can be used to enhance the granularity of an existing
photo classifier, or as a complement to name-based zero-shot learning.
Comment: published in CVPR 2018 as spotlight
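A toy version of the category-agnostic mapping: fit a regressor from sketch embeddings to classifier weights on seen categories, then apply it to a novel category's sketch. The linear least-squares fit and all data below are illustrative stand-ins for the paper's learned model regression network:

```python
import numpy as np

rng = np.random.default_rng(1)
d_sketch, d_photo, n_seen = 8, 12, 50   # hypothetical sizes

# Seen categories: sketch embeddings paired with "ground-truth"
# photo-classifier weights (both synthesised here for illustration).
S = rng.normal(size=(n_seen, d_sketch))
W_true = rng.normal(size=(d_photo, d_sketch))
W = S @ W_true.T + 0.01 * rng.normal(size=(n_seen, d_photo))

# Category-agnostic model regression: least-squares fit from sketch
# space to classifier-weight space over all seen categories at once.
H, *_ = np.linalg.lstsq(S, W, rcond=None)

# A novel category's classifier is then "just a sketch away":
novel_sketch = rng.normal(size=d_sketch)
novel_classifier = novel_sketch @ H
print(novel_classifier.shape)
```

Because the fit pools every seen category into one mapping, nothing in it is tied to a specific class label, which is what lets it transfer to unseen categories.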
A nomogram-based optimized Radscore for preoperative prediction of lymph node metastasis in patients with cervical cancer after neoadjuvant chemotherapy
Purpose: To construct a superior single-sequence radiomics signature to assess lymphatic metastasis in patients with cervical cancer after neoadjuvant chemotherapy (NACT).
Methods: The first half of the study was retrospectively conducted in our hospital between October 2012 and December 2021. Based on the history of NACT before surgery, all patients were divided into the NACT and surgery groups. The incidence of lymphatic metastasis in the two groups was determined from the pathological examination following lymphadenectomy. Patients from the primary and secondary centers who received NACT were enrolled for radiomics analysis in the second half of the study. The patient cohort from the primary center was randomly divided into training and test cohorts at a ratio of 7:3. All patients underwent magnetic resonance imaging after NACT. Segmentation was performed on T1-weighted imaging (T1WI), T2-weighted imaging, contrast-enhanced T1WI (CET1WI), and diffusion-weighted imaging.
Results: The rate of lymphatic metastasis in the NACT group (33.2%) was significantly lower than that in the surgery group (58.7%, P=0.007). The areas under the receiver operating characteristic curve of Radscore_CET1WI for distinguishing lymph node metastasis from non-lymphatic metastasis were 0.800 and 0.797 in the training and test cohorts, respectively, exhibiting superior diagnostic performance. After combining the clinical variables, tumor diameter on magnetic resonance imaging was incorporated into the Rad_clin model constructed with Radscore_CET1WI. The Hosmer–Lemeshow test of the Rad_clin model revealed no significant lack of fit in the training (P=0.594) or test cohort (P=0.748).
Conclusions: The Radscore provided by CET1WI may achieve higher diagnostic performance in predicting lymph node metastasis. Superior performance was observed with the Rad_clin model.
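A Rad_clin-style model, logistic regression on a radiomics score plus tumor diameter evaluated by ROC AUC, can be sketched on synthetic data. All coefficients and data below are invented for illustration and bear no relation to the study's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
radscore = rng.normal(size=n)                  # synthetic radiomics score
diameter = rng.normal(loc=3.0, size=n)         # synthetic tumor diameter (cm)
logit = 1.5 * radscore + 0.8 * (diameter - 3)  # invented ground-truth model
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)

# Rad_clin-style model: logistic regression on Radscore + diameter,
# fitted by plain gradient ascent on the log-likelihood.
X = np.column_stack([np.ones(n), radscore, diameter])
beta = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (y - p) / n

def auc(scores, labels):
    # Area under the ROC curve via the rank-sum (Mann-Whitney) formula.
    order = np.argsort(scores)
    ranks = np.empty(n); ranks[order] = np.arange(1, n + 1)
    n_pos = labels.sum(); n_neg = n - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(round(auc(X @ beta, y), 3))
```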
Chronic disease prevention literacy and its influence on behavior and lifestyle: a cross-sectional study in Xinjiang, China
Abstract
Objective: To understand the status and influencing factors of chronic disease prevention literacy among the Kyrgyz population, and to explore the impact of chronic disease prevention literacy on behavior and lifestyle.
Methods: Using a stratified sampling method, Kyrgyz residents aged ≥ 18 years in Artush City, Aheqi County and Ucha County were surveyed by questionnaire.
Results: A total of 10,468 subjects were investigated; the chronic disease prevention literacy rate among the Kyrgyz respondents was 11.2%. Logistic regression analysis showed that the literacy rate was lower among people with a low education level, herdsmen, those with low income, urban residents, and those with chronic disease (P < 0.05). Residents with chronic disease prevention literacy were more inclined not to smoke, not to drink alcohol, to drink milk every day, to eat soy products every month, and to eat whole grains every day (P < 0.05).
Conclusion: The chronic disease prevention literacy of Kyrgyz residents in Kezhou has improved, but it remains low compared with other subgroups. Behavioral lifestyle is related to the level of chronic disease prevention literacy. Therefore, local health promotion strategies should be developed to improve chronic disease prevention literacy and promote the formation of good behavioral and living habits.